Results 1 - 20 of 64
1.
Brief Bioinform ; 25(3)2024 Mar 27.
Article in English | MEDLINE | ID: mdl-38600667

ABSTRACT

Human leukocyte antigen (HLA) recognizes foreign threats and triggers immune responses by presenting peptides to T cells. Computationally modeling the binding patterns between peptide and HLA is very important for the development of tumor vaccines. However, it is still a big challenge to accurately predict HLA molecules binding peptides. In this paper, we develop a new model TripHLApan for predicting HLA molecules binding peptides by integrating triple coding matrix, BiGRU + Attention models, and transfer learning strategy. We have found the main interaction site regions between HLA molecules and peptides, as well as the correlation between HLA encoding and binding motifs. Based on the discovery, we make the preprocessing and coding closer to the natural biological process. Besides, due to the input being based on multiple types of features and the attention module focused on the BiGRU hidden layer, TripHLApan has learned more sequence level binding information. The application of transfer learning strategies ensures the accuracy of prediction results under special lengths (peptides in length 8) and model scalability with the data explosion. Compared with the current optimal models, TripHLApan exhibits strong predictive performance in various prediction environments with different positive and negative sample ratios. In addition, we validate the superiority and scalability of TripHLApan's predictive performance using additional latest data sets, ablation experiments and binding reconstitution ability in the samples of a melanoma patient. The results show that TripHLApan is a powerful tool for predicting the binding of HLA-I and HLA-II molecular peptides for the synthesis of tumor vaccines. TripHLApan is publicly available at https://github.com/CSUBioGroup/TripHLApan.git.


Subjects
Cancer Vaccines , Humans , Protein Binding , Peptides/chemistry , HLA Antigens/chemistry , Histocompatibility Antigens Class II/chemistry , Histocompatibility Antigens Class I/chemistry , Machine Learning
2.
Commun Biol ; 6(1): 870, 2023 08 24.
Article in English | MEDLINE | ID: mdl-37620651

ABSTRACT

Adverse Drug Reactions (ADRs) have a direct impact on human health. As continuous pharmacovigilance and drug monitoring prove to be costly and time-consuming, computational methods have emerged as promising alternatives. However, most existing computational methods primarily focus on predicting whether or not the drug is associated with an adverse reaction and do not consider the core issue of drug benefit-risk assessment: whether the treatment outcome is serious when adverse drug reactions occur. To this end, we categorize serious clinical outcomes caused by adverse reactions to drugs into seven distinct classes and present a deep learning framework, called GCAP, for predicting the seriousness of clinical outcomes of adverse reactions to drugs. GCAP has two tasks: one is to predict whether adverse reactions to drugs cause serious clinical outcomes, and the other is to infer the corresponding classes of serious clinical outcomes. Experimental results demonstrate that our method is a powerful and robust framework with high extendibility. GCAP can serve as a useful tool to successfully address the challenge of predicting the seriousness of clinical outcomes stemming from adverse reactions to drugs.


Subjects
Deep Learning , Drug-Related Side Effects and Adverse Reactions , Humans , Drug-Related Side Effects and Adverse Reactions/diagnosis , Drug-Related Side Effects and Adverse Reactions/epidemiology , Drug-Related Side Effects and Adverse Reactions/etiology , Pancreas
3.
Bioinformatics ; 39(9)2023 09 02.
Article in English | MEDLINE | ID: mdl-37606993

ABSTRACT

MOTIVATION: Cancer heterogeneity drastically affects cancer therapeutic outcomes. Predicting drug response in vitro is expected to help formulate personalized therapy regimens. In recent years, several computational models based on machine learning and deep learning have been proposed to predict drug response in vitro. However, most of these methods capture drug features based on a single drug description (e.g. drug structure), without considering the relationships between drugs and biological entities (e.g. target, diseases, and side effects). Moreover, most of these methods collect features separately for drugs and cell lines but fail to consider the pairwise interactions between drugs and cell lines. RESULTS: In this paper, we propose a deep learning framework, named MSDRP for drug response prediction. MSDRP uses an interaction module to capture interactions between drugs and cell lines, and integrates multiple associations/interactions between drugs and biological entities through similarity network fusion algorithms, outperforming some state-of-the-art models in all performance measures for all experiments. The experimental results of de novo test and independent test demonstrate the excellent performance of our model for new drugs. Furthermore, several case studies illustrate the rationality for using feature vectors derived from drug similarity matrices from multisource data to represent drugs and the interpretability of our model. AVAILABILITY AND IMPLEMENTATION: The codes of MSDRP are available at https://github.com/xyzhang-10/MSDRP.


Subjects
Deep Learning , Drug-Related Side Effects and Adverse Reactions , Humans , Algorithms , Cell Line , Machine Learning
4.
Bioinformatics ; 39(39 Suppl 1): i368-i376, 2023 06 30.
Article in English | MEDLINE | ID: mdl-37387178

ABSTRACT

MOTIVATION: Single-cell RNA sequencing (scRNA-seq) offers a powerful tool to dissect the complexity of biological tissues through cell sub-population identification in combination with clustering approaches. Feature selection is a critical step for improving the accuracy and interpretability of single-cell clustering. Existing feature selection methods underutilize the discriminatory potential of genes across distinct cell types. We hypothesize that incorporating such information could further boost the performance of single cell clustering. RESULTS: We develop CellBRF, a feature selection method that considers genes' relevance to cell types for single-cell clustering. The key idea is to identify genes that are most important for discriminating cell types through random forests guided by predicted cell labels. Moreover, it proposes a class balancing strategy to mitigate the impact of unbalanced cell type distributions on feature importance evaluation. We benchmark CellBRF on 33 scRNA-seq datasets representing diverse biological scenarios and demonstrate that it substantially outperforms state-of-the-art feature selection methods in terms of clustering accuracy and cell neighborhood consistency. Furthermore, we demonstrate the outstanding performance of our selected features through three case studies on cell differentiation stage identification, non-malignant cell subtype identification, and rare cell identification. CellBRF provides a new and effective tool to boost single-cell clustering accuracy. AVAILABILITY AND IMPLEMENTATION: All source codes of CellBRF are freely available at https://github.com/xuyp-csu/CellBRF.


Subjects
Benchmarking , Random Forest , Cell Differentiation , Cluster Analysis
5.
J Biomed Inform ; 143: 104396, 2023 07.
Article in English | MEDLINE | ID: mdl-37211195

ABSTRACT

Automated ICD coding is a multi-label prediction task aiming at assigning patient diagnoses with the most relevant subsets of disease codes. In the deep learning regime, recent works have suffered from large label sets and heavily imbalanced label distributions. To mitigate the negative effect in such scenarios, we propose a retrieve and rerank framework that introduces Contrastive Learning (CL) for label retrieval, allowing the model to make more accurate predictions from a simplified label space. Given the appealing discriminative power of CL, we adopt it as the training strategy to replace the standard cross-entropy objective and retrieve a small subset by taking the distance between clinical notes and ICD codes into account. After proper training, the retriever can implicitly capture code co-occurrence, which makes up for the deficiency of cross-entropy assigning each label independently of the others. Further, we evolve a powerful model via a Transformer variant for refining and reranking the candidate set, which can extract semantically meaningful features from long clinical sequences. Applying our method to well-known models, experiments show that our framework provides more accurate results guaranteed by preselecting a small subset of candidates before fine-level reranking. Relying on the framework, our proposed model achieves 0.590 and 0.990 in terms of Micro-F1 and Micro-AUC on the MIMIC-III benchmark.
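The Micro-F1 score reported above pools true positives, false positives, and false negatives over all notes and codes before computing F1. The following is a minimal sketch of that metric only, not the authors' evaluation script; the function name `micro_f1` and the example code sets are illustrative assumptions.

```python
from typing import List, Set


def micro_f1(true_labels: List[Set[str]], pred_labels: List[Set[str]]) -> float:
    """Micro-averaged F1: pool TP/FP/FN over every note, then compute F1 once."""
    tp = fp = fn = 0
    for truth, pred in zip(true_labels, pred_labels):
        tp += len(truth & pred)   # codes predicted and correct
        fp += len(pred - truth)   # codes predicted but not in the gold set
        fn += len(truth - pred)   # gold codes the model missed
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


# Illustrative (made-up) gold and predicted code sets for two clinical notes.
truth = [{"401.9", "250.00"}, {"486"}]
pred = [{"401.9"}, {"486", "401.9"}]
score = micro_f1(truth, pred)
print(round(score, 4))
```

Because the counts are pooled, frequent codes dominate the score, which is why micro-averaged metrics are usually reported alongside others on imbalanced label sets.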


Subjects
Electronic Health Records , International Classification of Diseases , Humans
6.
Bioinformatics ; 39(5)2023 05 04.
Article in English | MEDLINE | ID: mdl-37084258

ABSTRACT

MOTIVATION: Hi-C technology has been the most widely used chromosome conformation capture (3C) experiment that measures the frequency of all paired interactions in the entire genome, which is a powerful tool for studying the 3D structure of the genome. The fineness of the constructed genome structure depends on the resolution of Hi-C data. However, because high-resolution Hi-C data require deep sequencing and thus high experimental cost, most available Hi-C data are low-resolution. Hence, it is essential to enhance the quality of Hi-C data by developing effective computational methods. RESULTS: In this work, we propose a novel method, called DFHiC, which generates the high-resolution Hi-C matrix from the low-resolution Hi-C matrix in the framework of the dilated convolutional neural network. The dilated convolution is able to effectively explore the global patterns in the overall Hi-C matrix by taking advantage of the information of the Hi-C matrix over longer genomic distances. Consequently, DFHiC can improve the resolution of the Hi-C matrix reliably and accurately. More importantly, the super-resolution Hi-C data enhanced by DFHiC are more in line with the real high-resolution Hi-C data than data enhanced by other existing methods, in terms of both significant chromatin interactions and identifying topologically associating domains. AVAILABILITY AND IMPLEMENTATION: https://github.com/BinWangCSU/DFHiC.
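To illustrate why dilation lets a small kernel see longer genomic distances, here is a minimal one-dimensional sketch in plain Python. It is not the DFHiC architecture (which operates on 2D Hi-C matrices); the function name `dilated_conv1d` and the toy row of contact counts are assumptions for illustration.

```python
from typing import List


def dilated_conv1d(signal: List[float], kernel: List[float], dilation: int = 1) -> List[float]:
    """'Valid' 1D cross-correlation whose kernel taps are `dilation` bins apart."""
    span = (len(kernel) - 1) * dilation + 1  # bins covered by one output value
    return [
        sum(w * signal[i + j * dilation] for j, w in enumerate(kernel))
        for i in range(len(signal) - span + 1)
    ]


# Toy row of a contact matrix (made-up values).
row = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0]
kernel = [1.0, 0.0, -1.0]

dense = dilated_conv1d(row, kernel, dilation=1)    # each output spans 3 bins
dilated = dilated_conv1d(row, kernel, dilation=2)  # same 3 weights, spans 5 bins
print(dense, dilated)
```

With the same three weights, the dilated version compares bins four positions apart instead of two, so the receptive field grows without adding parameters.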


Subjects
Chromatin , Chromosomes , Chromatin/genetics , Genome , Genomics , Neural Networks, Computer
7.
Article in English | MEDLINE | ID: mdl-35471889

ABSTRACT

The identification of drug-target relations (DTRs) is important in drug development. A large number of methods treat DTRs as drug-target interactions (DTIs), a binary classification problem. The main drawbacks of these methods are the lack of reliable negative samples and the absence of many important aspects of DTR, including their dose dependence and quantitative affinities. With the recent increase in published drug-protein binding affinity data, DTR prediction can be viewed as a regression problem of drug-target affinities (DTAs), which reflect how tightly the drug binds to the target and can present more detailed and specific information than DTIs. The growth of affinity data enables the use of deep learning architectures, which have been shown to be among the state-of-the-art methods in binding affinity prediction. Although relatively effective, due to the black-box nature of deep learning, these models are less biologically interpretable. In this study, we proposed a deep learning-based model, named AttentionDTA, which uses an attention mechanism to predict DTAs. Different from the models using 3D structures of drug-target complexes or graph representation of drugs and proteins, the novelty of our work is to use an attention mechanism to focus on key subsequences which are important in drug and protein sequences when predicting their affinity. We use two separate one-dimensional Convolutional Neural Networks (1D-CNNs) to extract the semantic information of the drug's SMILES string and the protein's amino acid sequence. Furthermore, a two-side multi-head attention mechanism is developed and embedded in our model to explore the relationship between drug features and protein features. We evaluate our model on three established DTA benchmark datasets, Davis, Metz, and KIBA. AttentionDTA outperforms the state-of-the-art deep learning methods under different evaluation metrics. The results show that the attention-based model can effectively extract protein features related to drug information and drug features related to protein information to better predict drug-target affinities. It is worth mentioning that we test our model on an IC50 dataset, which provides the binding sites between drugs and proteins, to evaluate the ability of our model to locate binding sites. Finally, we visualize the attention weights to demonstrate the biological significance of the model. The source code of AttentionDTA can be downloaded from https://github.com/zhaoqichang/AttentionDTA_TCBB.
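As a rough illustration of how attention weights relate one set of features to another (for example, drug positions attending over protein positions), the sketch below implements generic single-head scaled dot-product attention in plain Python. This is the standard building block, not the two-side multi-head mechanism of AttentionDTA; the function names and toy vectors are assumptions.

```python
import math
from typing import List, Tuple

Vector = List[float]


def softmax(scores: Vector) -> Vector:
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention(queries: List[Vector], keys: List[Vector],
              values: List[Vector]) -> Tuple[List[Vector], List[Vector]]:
    """Return (outputs, weights) of scaled dot-product attention."""
    scale = math.sqrt(len(keys[0]))
    outputs, weights = [], []
    for q in queries:
        scores = [sum(a * b for a, b in zip(q, k)) / scale for k in keys]
        w = softmax(scores)  # one weight per key position, summing to 1
        weights.append(w)
        outputs.append([sum(wi * v[c] for wi, v in zip(w, values))
                        for c in range(len(values[0]))])
    return outputs, weights


# Toy example: one "drug" query attending over two "protein" positions.
outputs, weights = attention(
    queries=[[1.0, 0.0]],
    keys=[[1.0, 0.0], [0.0, 1.0]],
    values=[[10.0], [20.0]],
)
print(weights[0], outputs[0])
```

The per-position weights are what such models typically visualize: a large weight marks the key position that contributed most to the output for that query.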


Subjects
Deep Learning , Drug Development , Binding Sites , Amino Acid Sequence , Benchmarking
8.
Article in English | MEDLINE | ID: mdl-35476573

ABSTRACT

The understanding of protein functions is critical to many biological problems such as the development of new drugs and new crops. To reduce the huge gap between the increase of protein sequences and annotations of protein functions, many methods have been proposed to deal with this problem. These methods use Gene Ontology (GO) to classify the functions of proteins and consider one GO term as a class label. However, they ignore the co-occurrence of GO terms that is helpful for protein function prediction. We propose a new deep learning model, named DeepPFP-CO, which uses Graph Convolutional Network (GCN) to explore and capture the co-occurrence of GO terms to improve the protein function prediction performance. In this way, we can further deduce the protein functions by fusing the predicted propensity of the center function and its co-occurrence functions. We use Fmax and AUPR to evaluate the performance of DeepPFP-CO and compare DeepPFP-CO with state-of-the-art methods such as DeepGOPlus and DeepGOA. The computational results show that DeepPFP-CO outperforms DeepGOPlus and other methods. Moreover, we further analyze our model at the protein level. The results have demonstrated that DeepPFP-CO improves the performance of protein function prediction. DeepPFP-CO is available at https://csuligroup.com/DeepPFP/.
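The Fmax metric mentioned above is the best protein-centric F-measure over all score thresholds. The sketch below follows the definition commonly used in CAFA-style evaluations (precision averaged over proteins with at least one prediction at the threshold, recall averaged over all proteins); it is an assumption that the paper uses exactly this variant, and the function name and toy GO term scores are illustrative.

```python
from typing import Dict, List, Set


def fmax(truth: List[Set[str]], scores: List[Dict[str, float]], steps: int = 100) -> float:
    """Maximum protein-centric F-measure over thresholds 1/steps, 2/steps, ..., 1."""
    best = 0.0
    for step in range(1, steps + 1):
        threshold = step / steps
        precisions = []      # only proteins with at least one predicted term
        recall_sum = 0.0     # summed over all proteins
        for true_terms, term_scores in zip(truth, scores):
            predicted = {t for t, s in term_scores.items() if s >= threshold}
            hits = len(predicted & true_terms)
            if predicted:
                precisions.append(hits / len(predicted))
            if true_terms:
                recall_sum += hits / len(true_terms)
        if not precisions:
            continue
        precision = sum(precisions) / len(precisions)
        recall = recall_sum / len(truth)
        if precision + recall > 0:
            best = max(best, 2 * precision * recall / (precision + recall))
    return best


# Toy annotations and predicted scores for two proteins (made-up values).
truth = [{"GO:0000001"}, {"GO:0000002"}]
scores = [
    {"GO:0000001": 0.9, "GO:0000002": 0.1},
    {"GO:0000002": 0.8, "GO:0000001": 0.2},
]
print(fmax(truth, scores))
```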


Subjects
Deep Learning , Gene Ontology , Proteins/genetics , Amino Acid Sequence
9.
Article in English | MEDLINE | ID: mdl-35104223

ABSTRACT

Topologically associating domains (TADs) are local chromatin interaction domains, which have been shown to play an important role in gene expression regulation. TADs were originally discovered in the investigation of 3D genome organization based on High-throughput Chromosome Conformation Capture (Hi-C) data. Considerable and continuing efforts have been dedicated to developing methods for detecting TADs from Hi-C data. Different computational methods for TAD identification vary in their assumptions and criteria for calling TADs. As a consequence, the TADs called by these methods differ in their similarity and in the biological features they are enriched in. In this work, we performed a systematic comparison of twenty-six TAD callers. We first compared the TADs and gaps between adjacent TADs across different methods, resolutions, and sequencing depths. We then assessed the quality of TADs and TAD boundaries according to three criteria: the decay of contact frequencies over the genomic distance, enrichment and depletion of regulatory elements around TAD boundaries, and reproducibility of TADs and TAD boundaries in replicate samples. Last, due to the lack of a gold standard of TADs, we also evaluated the performance of the methods on synthetic datasets. We discussed the key principles of TAD callers, and pinpointed the current situation in the detection of TADs. We provide a concise, comprehensive, and systematic framework for evaluating the performance of TAD callers, and expect our work will provide useful guidance in choosing suitable approaches for the detection and evaluation of TADs.


Subjects
Chromatin , Chromosomes , Reproducibility of Results , Chromatin/genetics , Chromosomes/genetics , Genome , Gene Expression Regulation
10.
IEEE/ACM Trans Comput Biol Bioinform ; 20(5): 2712-2723, 2023.
Article in English | MEDLINE | ID: mdl-34110998

ABSTRACT

The Anatomical Therapeutic Chemical (ATC) classification system, designated by the World Health Organization Collaborating Center (WHOCC), has been widely used in drug screening, repositioning, and similarity research. The ATC classification system assigns different codes to drugs according to the organ or system on which they act and/or their therapeutic and chemical characteristics. Correctly identifying the potential ATC codes for drugs can accelerate drug development and reduce the cost of experiments. Several classifiers have been proposed in this regard. However, they lack the ability to learn basic features from sparsely known drug-ATC code associations. Therefore, there is an urgent need for novel computational methods to precisely predict potential drug-ATC code associations at multiple levels of the ATC classification system based on known associations between drugs and ATC codes. In this paper, we provide a novel end-to-end model, called RNPredATC, to predict potential drug-ATC code associations at five ATC classification levels. RNPredATC can extract dense feature vectors from sparsely known drug-ATC code associations and reduce the impact of the degradation problem through a novel deep residual learning approach. We extensively compare our method with some state-of-the-art methods, including NetPredATC, SPACE, and some multi-label-based methods. Our experimental results show that RNPredATC achieves better performance in five-fold and ten-fold cross validations. Furthermore, the visualization analysis of hidden layers and case studies of predicted associations at the fifth ATC classification level confirm that RNPredATC can effectively identify the potential ATC codes of drugs.

11.
IEEE/ACM Trans Comput Biol Bioinform ; 20(3): 1943-1952, 2023.
Article in English | MEDLINE | ID: mdl-36445997

ABSTRACT

Drug discovery and drug repurposing often rely on the successful prediction of drug-target interactions (DTIs). Recent advances have shown great promise in applying deep learning to drug-target interaction prediction. One challenge in building deep learning-based models is to adequately represent drugs and proteins that encompass the fundamental local chemical environments and long-distance information among amino acids of proteins (or atoms of drugs). Another challenge is to efficiently model the intermolecular interactions between drugs and proteins, which plays vital roles in the DTIs. To this end, we propose a novel model, GIFDTI, which consists of three key components: the sequence feature extractor (CNNFormer), the global molecular feature extractor (GF), and the intermolecular interaction modeling module (IIF). Specifically, CNNFormer incorporates CNN and Transformer to capture the local patterns and encode the long-distance relationship among tokens (atoms or amino acids) in a sequence. Then, GF and IIF extract the global molecular features and the intermolecular interaction features, respectively. We evaluate GIFDTI on six realistic evaluation strategies and the results show it improves DTI prediction performance compared to state-of-the-art methods. Moreover, case studies confirm that our model can be a useful tool to accurately yield low-cost DTIs. The codes of GIFDTI are available at https://github.com/zhaoqichang/GIFDTI.


Subjects
Drug Development , Proteins , Proteins/chemistry , Drug Development/methods , Drug Discovery/methods , Drug Repositioning , Amino Acids
12.
Brief Bioinform ; 24(1)2023 01 19.
Article in English | MEDLINE | ID: mdl-36511222

ABSTRACT

Circular RNAs (circRNAs) are reverse-spliced and covalently closed RNAs. Their interactions with RNA-binding proteins (RBPs) have multiple effects on the progress of many diseases. Some computational methods have been proposed to identify RBP binding sites on circRNAs but suffer from insufficient accuracy, robustness and explanation. In this study, we first take the characteristics of both RNA and RBP into consideration. We propose a method for discriminating circRNA-RBP binding sites based on multi-scale characterizing sequence and structure features, called CRMSS. For circRNAs, we use sequence $k$-mer embedding and the forming probabilities of local secondary structures as features. For RBPs, we combine sequence and structure frequencies of RNA-binding domain regions to generate features. We capture binding patterns with multi-scale residual blocks. With BiLSTM and an attention mechanism, we obtain the contextual information of high-level representation for circRNA-RBP binding. To validate the effectiveness of CRMSS, we compare its predictive performance with other methods on 37 RBPs. Taking the properties of both circRNAs and RBPs into account, CRMSS achieves superior performance over state-of-the-art methods. In the case study, our model provides reliable predictions and correctly identifies experimentally verified circRNA-RBP pairs. The code of CRMSS is freely available at https://github.com/BioinformaticsCSU/CRMSS.
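A $k$-mer embedding starts by splitting a sequence into overlapping windows of length $k$ and mapping each window to an integer id that an embedding layer can look up. The sketch below shows only that tokenization step under generic assumptions; it is not the CRMSS preprocessing code, and the function names, the choice of $k = 3$, and the toy sequence are illustrative.

```python
from typing import Dict, List


def kmer_tokens(sequence: str, k: int = 3) -> List[str]:
    """Split a sequence into overlapping k-mers with stride 1."""
    if k <= 0:
        raise ValueError("k must be positive")
    return [sequence[i:i + k] for i in range(len(sequence) - k + 1)]


def build_vocab(tokens: List[str]) -> Dict[str, int]:
    """Assign each distinct k-mer an integer id in sorted order."""
    return {token: idx for idx, token in enumerate(sorted(set(tokens)))}


# Toy RNA fragment (made up for illustration).
tokens = kmer_tokens("AUGGCAUG", k=3)
vocab = build_vocab(tokens)
ids = [vocab[t] for t in tokens]
print(tokens, ids)
```

The resulting id sequence is what would be fed to an embedding lookup; the embedding vectors themselves are learned and are not shown here.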


Subjects
RNA, Circular , RNA , RNA, Circular/genetics , Binding Sites , RNA/metabolism , RNA-Binding Proteins/metabolism
13.
IEEE J Biomed Health Inform ; 26(10): 5201-5212, 2022 10.
Article in English | MEDLINE | ID: mdl-35867367

ABSTRACT

Automatic International Classification of Diseases (ICD) coding is defined as a kind of text multi-label classification problem, which is difficult because the number of labels is very large and the distribution of labels is unbalanced. The label-wise attention mechanism is widely used in automatic ICD coding because it can assign weights to every word in full Electronic Medical Records (EMR) for different ICD codes. However, the label-wise attention mechanism is redundant and costly in computing. In this paper, we propose a pseudo label-wise attention mechanism to tackle the problem. Instead of computing different attention modes for different ICD codes, the pseudo label-wise attention mechanism automatically merges similar ICD codes and computes only one attention mode for the similar ICD codes, which greatly compresses the number of attention modes and improves prediction accuracy. In addition, we apply a more convenient and effective way to obtain the ICD vectors, and thus our model can predict new ICD codes by calculating the similarities between EMR vectors and ICD vectors. Our model demonstrates effectiveness in extensive computational experiments. On the public MIMIC-III dataset and private Xiangya dataset, our model achieves the best performance on micro F1 (0.583 and 0.806), micro AUC (0.986 and 0.994), P@8 (0.756 and 0.413), and uses much less GPU memory (about 26.1% of the models with label-wise attention). Furthermore, we verify the ability of our model in predicting new ICD codes. The interpretability analysis and case study show the effectiveness and reliability of the patterns obtained by the pseudo label-wise attention mechanism.
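The P@8 figure above is precision at rank 8: the fraction of the eight highest-scoring codes for a record that appear in its gold code set. A minimal sketch of the per-record metric follows; the function name and the example codes are illustrative assumptions, and dataset-level P@8 would be the mean over records.

```python
from typing import List, Set


def precision_at_k(ranked_codes: List[str], true_codes: Set[str], k: int = 8) -> float:
    """Fraction of the k highest-ranked codes that are in the gold set."""
    if k <= 0:
        raise ValueError("k must be positive")
    top_k = ranked_codes[:k]
    return sum(code in true_codes for code in top_k) / k


# Made-up ranking (best first) and gold codes for one record.
ranked = ["401.9", "428.0", "250.00", "486", "584.9", "272.4", "414.01", "427.31"]
gold = {"401.9", "250.00", "427.31", "518.81"}
print(precision_at_k(ranked, gold, k=8))
```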


Subjects
Electronic Health Records , International Classification of Diseases , Clinical Coding , Humans , Reproducibility of Results
14.
Bioinformatics ; 38(17): 4153-4161, 2022 09 02.
Article in English | MEDLINE | ID: mdl-35801934

ABSTRACT

MOTIVATION: Identifying drug-target interactions is a crucial step for drug discovery and design. Traditional biochemical experiments are credible to accurately validate drug-target interactions. However, they are also extremely laborious, time-consuming and expensive. With the collection of more validated biomedical data and the advancement of computing technology, the computational methods based on chemogenomics gradually attract more attention, which guide the experimental verifications. RESULTS: In this study, we propose an end-to-end deep learning-based method named IIFDTI to predict drug-target interactions (DTIs) based on independent features of drug-target pairs and interactive features of their substructures. First, the interactive features of substructures between drugs and targets are extracted by the bidirectional encoder-decoder architecture. The independent features of drugs and targets are extracted by the graph neural networks and convolutional neural networks, respectively. Then, all extracted features are fused and inputted into fully connected dense layers in downstream tasks for predicting DTIs. IIFDTI takes into account the independent features of drugs/targets and simulates the interactive features of the substructures from the biological perspective. Multiple experiments show that IIFDTI outperforms the state-of-the-art methods in terms of the area under the receiver operating characteristics curve (AUC), the area under the precision-recall curve (AUPR), precision, and recall on benchmark datasets. In addition, the mapped visualizations of attention weights indicate that IIFDTI has learned the biological knowledge insights, and two case studies illustrate the capabilities of IIFDTI in practical applications. AVAILABILITY AND IMPLEMENTATION: The data and codes underlying this article are available in Github at https://github.com/czjczj/IIFDTI. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
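The AUC used above can be read as the probability that a randomly chosen interacting pair receives a higher score than a randomly chosen non-interacting pair. The sketch below computes it directly from that pairwise definition, counting ties as one half; it is a generic illustration rather than the evaluation code of IIFDTI, and the function name and toy data are assumptions.

```python
from typing import List


def roc_auc(labels: List[int], scores: List[float]) -> float:
    """Area under the ROC curve via pairwise comparison of positives and negatives."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative example")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in positives
        for n in negatives
    )
    return wins / (len(positives) * len(negatives))


# Toy labels (1 = interaction) and predicted scores.
labels = [1, 0, 1, 0]
scores = [0.9, 0.8, 0.4, 0.1]
print(roc_auc(labels, scores))
```

The pairwise form is quadratic in the number of examples, so it suits small illustrations; rank-based formulas give the same value more efficiently on large benchmarks.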


Subjects
Drug Discovery , Neural Networks, Computer , Drug Interactions , Area Under Curve , Drug Discovery/methods , ROC Curve
15.
Bioinformatics ; 38(7): 1995-2002, 2022 03 28.
Article in English | MEDLINE | ID: mdl-35043942

ABSTRACT

MOTIVATION: The identification of compound-protein interactions (CPIs) is an essential step in the process of drug discovery. The experimental determination of CPIs is known to consume large amounts of money and time. Computational models have therefore become a promising and efficient alternative for predicting novel interactions between compounds and proteins on a large scale. Most supervised machine learning models approach the task as a binary classification problem, aiming to predict whether there is an interaction between the compound and the protein or not. However, CPI is not a simple binary on-off relationship, but a continuous value that reflects how tightly the compound binds to a particular target protein, also called binding affinity. RESULTS: In this study, we propose an end-to-end neural network model, called BACPI, to predict CPI and binding affinity. We employ a graph attention network and a convolutional neural network (CNN) to learn the representations of compounds and proteins and develop a bi-directional attention neural network model to integrate the representations. To evaluate the performance of BACPI, we use three CPI datasets and four binding affinity datasets in our experiments. The results show that, when predicting CPIs, BACPI significantly outperforms other available machine learning methods on both balanced and unbalanced datasets. This suggests that the end-to-end neural network model that predicts CPIs directly from low-level representations is more robust than traditional machine learning-based methods. When predicting binding affinities, BACPI achieves higher performance on large datasets compared to other state-of-the-art deep learning methods. This comparison result suggests that the proposed method with a bi-directional attention neural network can capture the important regions of compounds and proteins for binding affinity prediction. AVAILABILITY AND IMPLEMENTATION: Data and source codes are available at https://github.com/CSUBioGroup/BACPI.


Subjects
Neural Networks, Computer , Software , Proteins/chemistry , Machine Learning , Drug Discovery/methods
16.
IEEE/ACM Trans Comput Biol Bioinform ; 19(6): 3263-3271, 2022.
Article in English | MEDLINE | ID: mdl-34699365

ABSTRACT

Essential proteins are considered the foundation of life as they are indispensable for the survival of living organisms. Computational methods for essential protein discovery provide a fast way to identify essential proteins. But most of them heavily rely on various biological information, especially protein-protein interaction networks, which limits their practical applications. With the rapid development of high-throughput sequencing technology, sequencing data has become the most accessible biological data. However, using only protein sequence information to predict essential proteins has limited accuracy. In this paper, we propose EP-EDL, an ensemble deep learning model using only protein sequence information to predict human essential proteins. EP-EDL integrates multiple classifiers to alleviate the class imbalance problem and to improve prediction accuracy and robustness. In each base classifier, we employ multi-scale text convolutional neural networks to extract useful features from protein sequence feature matrices with evolutionary information. Our computational results show that EP-EDL outperforms the state-of-the-art sequence-based methods. Furthermore, EP-EDL provides a more practical and flexible way for biologists to accurately predict essential proteins. The source code and datasets can be downloaded from https://github.com/CSUBioGroup/EP-EDL.


Subjects
Deep Learning , Humans , Neural Networks, Computer , Proteins/genetics , Amino Acid Sequence , Software
17.
IEEE/ACM Trans Comput Biol Bioinform ; 19(4): 2092-2110, 2022.
Article in English | MEDLINE | ID: mdl-33769935

ABSTRACT

The identification of compound-protein relations (CPRs), which includes compound-protein interactions (CPIs) and compound-protein affinities (CPAs), is critical to drug development. A common method for compound-protein relation identification is the use of in vitro screening experiments. However, the number of compounds and proteins is massive, and in vitro screening experiments are labor-intensive, expensive, and time-consuming with high failure rates. Researchers have developed a computational field called virtual screening (VS) to aid experimental drug development. These methods utilize experimentally validated biological interaction information to generate datasets and use the physicochemical and structural properties of compounds and target proteins as input information to train computational prediction models. At present, deep learning has been widely used in computer vision and natural language processing and has experienced epoch-making progress. At the same time, deep learning has also been used in the field of biomedicine widely, and the prediction of CPRs based on deep learning has developed rapidly and has achieved good results. The purpose of this study is to investigate and discuss the latest applications of deep learning techniques in CPR prediction. First, we describe the datasets and feature engineering (i.e., compound and protein representations and descriptors) commonly used in CPR prediction methods. Then, we review and classify recent deep learning approaches in CPR prediction. Next, a comprehensive comparison is performed to demonstrate the prediction performance of representative methods on classical datasets. Finally, we discuss the current state of the field, including the existing challenges and our proposed future directions. We believe that this investigation will provide sufficient references and insight for researchers to understand and develop new deep learning methods to enhance CPR predictions.


Subjects
Deep Learning; Proteins; Computer Simulation; Proteins/chemistry
18.
Brief Bioinform ; 22(6)2021 Nov 05.
Article in English | MEDLINE | ID: mdl-34213525

ABSTRACT

Identifying the frequencies of drug side effects is an important issue in pharmacological studies and drug risk-benefit assessment. However, designing clinical trials to determine the frequencies is usually time-consuming and expensive, and most existing methods can only predict the existence of drug-side effect associations, not their frequencies. Inspired by the recent progress of graph neural networks in recommender systems, we develop a novel prediction model for drug-side effect frequencies, using a graph attention network to integrate three different types of features: similarity information, known drug-side effect frequency information and word embeddings. In comparison, the few available studies focusing on frequency prediction use only the known drug-side effect frequency scores. One novel approach used in this work first decomposes the feature types in the drug-side effect graph to extract different view representation vectors based on the three types of features, and then recombines these latent view vectors automatically to obtain unified embeddings for prediction. The proposed method demonstrates high effectiveness in 10-fold cross-validation. The computational results show that the proposed method achieves the best performance on the benchmark dataset, outperforming the state-of-the-art matrix decomposition model. In addition, ablation experiments and visual analyses are supplied to illustrate the usefulness of our method for predicting drug-side effect frequencies. The code for MGPred is available at https://github.com/zhc940702/MGPred and https://zenodo.org/record/4449613.


Subjects
Drug-Related Side Effects and Adverse Reactions/diagnosis; Medical Informatics/methods; Software; Algorithms; Benchmarking; Databases, Factual; Deep Learning; Drug Interactions; Drug-Related Side Effects and Adverse Reactions/etiology; Humans; Reproducibility of Results
19.
Brief Bioinform ; 22(5)2021 Sep 02.
Article in English | MEDLINE | ID: mdl-33834190

ABSTRACT

Biomolecular recognition between ligand and protein plays an essential role in drug discovery and development. However, determining protein-ligand binding affinity experimentally is extremely time- and resource-consuming. Many computational methods have been proposed to predict binding affinity, most of which require protein 3D structures that are often unavailable. Therefore, new methods that can take full advantage of sequence-level features are greatly needed to predict protein-ligand binding affinity and accelerate the drug discovery process. We developed a novel deep learning approach, named DeepDTAF, to predict protein-ligand binding affinity. DeepDTAF was constructed by integrating local and global contextual features. More specifically, the protein-binding pocket, which possesses some special properties for directly binding the ligand, was first used as the local input feature for protein-ligand binding affinity prediction. Furthermore, dilated convolution was used to capture multiscale long-range interactions. We compared DeepDTAF with recent state-of-the-art methods and analyzed the effectiveness of different parts of our model. The significant accuracy improvement showed that DeepDTAF is a reliable tool for affinity prediction. The source code and data are available at https://github.com/KailiWang1/DeepDTAF.
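
The dilated convolution mentioned in this abstract can be illustrated with a minimal sketch. This is not the DeepDTAF implementation; the function names `dilated_conv1d` and `receptive_field` are hypothetical, and the example only shows how spacing the kernel taps apart lets a small kernel cover a longer stretch of a sequence.

```python
# Minimal sketch of a 1D dilated convolution (cross-correlation form, as
# used in deep learning). Hypothetical helper names; not DeepDTAF code.

def receptive_field(kernel_size, dilation):
    """Number of input positions spanned by one output value."""
    return (kernel_size - 1) * dilation + 1


def dilated_conv1d(signal, kernel, dilation=1):
    """Slide the kernel over the signal with taps spaced `dilation` apart.

    No padding is applied, so the output is shorter than the input.
    """
    span = receptive_field(len(kernel), dilation)
    output = []
    for i in range(len(signal) - span + 1):
        value = sum(kernel[k] * signal[i + k * dilation]
                    for k in range(len(kernel)))
        output.append(value)
    return output


signal = [1, 2, 3, 4, 5, 6, 7]
kernel = [1, 0, -1]

dense = dilated_conv1d(signal, kernel, dilation=1)   # taps at i, i+1, i+2
sparse = dilated_conv1d(signal, kernel, dilation=2)  # taps at i, i+2, i+4
```

With the same three-tap kernel, a dilation of 2 widens the span from 3 to 5 input positions, which is how stacking layers with increasing dilation can reach distant residues without enlarging the kernel.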


Subjects
Deep Learning; Models, Molecular; Proteins/chemistry; Proteins/metabolism; Amino Acid Sequence; Binding Sites; Data Accuracy; Drug Discovery/methods; Hydrogen Bonding; Ligands; Protein Binding; Protein Conformation, alpha-Helical; Reproducibility of Results; Software
20.
Brief Bioinform ; 22(5)2021 Sep 02.
Article in English | MEDLINE | ID: mdl-33822883

ABSTRACT

The rapid increase in genome data brought by gene sequencing technologies poses a massive challenge to data processing. To solve the problems caused by enormous data volumes and complex computing requirements, researchers have proposed many methods and tools, which can be divided into three types: big data storage, efficient algorithm design and parallel computing. The purpose of this review is to investigate popular parallel programming technologies for genome sequence processing. Three common parallel computing models are introduced according to their hardware architectures; each is classified into two or three types and further analyzed by its features. Then, parallel computing for genome sequence processing is discussed through four common applications: genome sequence alignment, single nucleotide polymorphism calling, genome sequence preprocessing, and pattern detection and searching. For each application, its background is first introduced, and then a list of tools or algorithms is summarized in terms of principle, hardware platform and computing efficiency. The programming model of each hardware platform and application provides a reference for researchers choosing high-performance computing tools. Finally, we discuss the limitations and future trends of parallel computing technologies.
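
As a rough illustration of the data-parallel idea behind pattern searching in genome sequences, the sketch below splits a sequence into chunks and searches them concurrently. It is not taken from the review; `search_chunk` and `parallel_find` are hypothetical names. A thread pool is used only to keep the example short, so in CPython it shows the decomposition rather than a real speedup, and a process pool or GPU kernel would be the usual choice in practice.

```python
# Sketch: chunked pattern search over a sequence, assuming an exact-match
# pattern. Each chunk owns the matches that START inside it, and looks
# len(pattern) - 1 characters past its end so boundary matches are kept.
from concurrent.futures import ThreadPoolExecutor
from math import ceil


def search_chunk(text, pattern, start, end):
    """Return positions in [start, end) where `pattern` begins."""
    window_end = min(len(text), end + len(pattern) - 1)
    hits = []
    pos = text.find(pattern, start, window_end)
    while pos != -1:
        hits.append(pos)
        pos = text.find(pattern, pos + 1, window_end)  # allow overlaps
    return hits


def parallel_find(text, pattern, n_chunks=4):
    """Find all occurrences of `pattern` by searching chunks concurrently."""
    if not pattern:
        raise ValueError("pattern must be non-empty")
    size = max(1, ceil(len(text) / n_chunks))
    ranges = [(s, min(s + size, len(text)))
              for s in range(0, len(text), size)]
    with ThreadPoolExecutor() as pool:
        results = pool.map(lambda r: search_chunk(text, pattern, *r), ranges)
        return sorted(p for hits in results for p in hits)


positions = parallel_find("ACGTACGTAC", "ACG", n_chunks=3)
overlapping = parallel_find("AAAA", "AA", n_chunks=2)
```

The overlap of `len(pattern) - 1` characters is the key design choice: without it, a match straddling two chunks would be missed, and because each chunk reports only matches that start inside it, no match is counted twice.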


Subjects
Electronic Data Processing/methods; Genome, Human; Genomics/methods; Polymorphism, Single Nucleotide; Sequence Alignment/methods; Algorithms; Base Sequence/genetics; Chromosome Mapping/methods; High-Throughput Nucleotide Sequencing/methods; Humans; Information Storage and Retrieval; Software; Whole Genome Sequencing/methods